INTRODUCTION

This Final Technical Report constitutes a summary of the research performed under
Grant AFOSR-86-0026 during the period November 1, 1986 through April 30, 1990.
First we present a list of the personnel involved in the research effort. Then in the
following section we present a brief summary of the research results that have been
achieved. Each of these results is well documented in technical articles, and references to
these articles are made in the summary of the research results.


PERSONNEL 


Principal  Investigator: 

Gary  L.  Wise 

Postdoctoral  Research  Associates: 
John  M.  Morrison 
Alan E. Wessel
Graduate  Research  Associate: 

Eric  B.  Hall 


A  SURVEY  OF  RESULTS 


In this Final Technical Report we will briefly comment upon our research
accomplishments sponsored by Grant AFOSR-86-0026. Much of our work during this
period  was  concerned  with  various  aspects  of  estimation  theory.  Additional  work  was 
done  in  the  areas  of  contention  resolution  for  local  area  computer  networks,  signal  detection 
theory,  data  compression  for  image  processing,  and  linear  system  theory. 

Typically,  the  problem  of  estimation  is  concerned  with  attempting  to  approximate  a 
desired  quantity  by  a  function  of  the  available  data  so  as  to  minimize  a  prescribed  fidelity 
criterion. A commonplace example might be given by attempting to estimate a second order
random variable X (perhaps a signal of interest) by some function $f(\cdot)$ of the datum Y
(perhaps a noise corrupted version of the signal) so as to minimize the mean square error
$E([X - f(Y)]^2)$. This example appears in many works on the subject of estimation theory.
In earlier work, sponsored by a previous AFOSR Grant, we showed [1] that the best such
function is not necessarily given by $f(Y) = E(X \mid Y)$, even though X and Y are both
bounded  random  variables.  Moreover,  it  might  seem  that  there  is  little  justification  from  a 
practical  viewpoint  of  choosing  the  mean  square  error  as  the  appropriate  fidelity  criterion. 
Consider  a  fidelity  criterion  given  by  the  expectation  of  a  cost  function  of  the  error.  In  the 
context  of  estimation  theory,  one  is  often  confronted  with  two  concerns  in  choosing  a  cost 
function:  the  concern  that  the  cost  function  adequately  reflects  the  cost  one  wishes  to  attach 
to  an  error,  and  the  concern  that  the  cost  function  results  in  a  problem  which  one  finds  to  be 
mathematically  tractable.  A  cursory  inspection  of  the  literature  in  estimation  theory  might 
suggest  that  in  many  cases  the  second  of  the  above  concerns  totally  eclipses  the  first 
concern. We began a serious study of estimation theory. This work was directed to the
very underpinnings of estimation theory, and it is representative of what in many cases in
the literature is ignored, is postulated with no concern for the consistency of everything
being postulated, or is otherwise swept under the rug. The two such areas in which we
have achieved some success are concerned with continuity properties of filtrations of
σ-algebras generated by stochastic processes and with the convergence rate of the martingale
convergence theorem. We will now briefly comment on our results in these areas.

Let $(\Omega, \mathcal{F}, P)$ be a probability space. We take a filtration of σ-algebras to be any
nondecreasing collection of σ-subalgebras of $\mathcal{F}$ indexed by $[0, \infty)$. Let $\{\mathcal{F}_t : t \ge 0\}$ be a
filtration. Define $\mathcal{F}_{0-} = \mathcal{F}_0$; otherwise, define

$$\mathcal{F}_{t-} = \bigvee_{s < t} \mathcal{F}_s \quad \text{and} \quad \mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s.$$

We say that a filtration is continuous at t if $\mathcal{F}_{t-} = \mathcal{F}_{t+}$; we say that it is left continuous at t
if $\mathcal{F}_{t-} = \mathcal{F}_t$; and we say that it is right continuous at t if $\mathcal{F}_t = \mathcal{F}_{t+}$. The filtration
$\{\mathcal{F}_t : t \ge 0\}$ is continuous, left continuous, or right continuous if it is continuous, left
continuous, or right continuous, respectively, at t for all $t \ge 0$. Let $\{X(t) : t \in [0, \infty)\}$ be a
stochastic process defined on $(\Omega, \mathcal{F}, P)$. By a P-null set we mean a measurable set which
has probability zero. The canonical filtration of this stochastic process is given by
$\mathcal{F}_t = \sigma(X(s) : s \le t) \vee (P\text{-null sets})$ for $t \ge 0$.

In the context of estimation theory where the data are represented by a stochastic
process  indexed  by  an  interval  of  real  numbers,  much  of  the  current  literature  is  concerned 
with  stochastic  differential  equations  and  with  martingale  theory.  Stochastic  differential 
equations  often  arise  as  models  for  stochastic  dynamical  systems  and  techniques  from 
martingale  theory  often  arise  in  the  analysis  of  estimation  schemes  and  their  approximation 
properties.  In  these  areas  one  often  encounters  hypotheses  stipulating  the  right  continuity 
of  filtrations  of  o-algebras  generated  by  stochastic  processes.  This  is  a  blanket  assumption 
made  by  many  in  the  French  and  Soviet  schools  of  stochastic  process  theory;  see,  for 
example, [2], [3], [4], and [5]. However, the question emerges as to when this
assumption  is  justified  or  as  to  what  reasonable  hypotheses  might  imply  it.  It  is  often 
tempting  and  pleasing  to  the  intuition  to  believe  that  the  regularity  of  the  sample  paths  of  a 
stochastic  process  and  the  continuity  of  its  associated  canonical  filtration  are  closely  related. 
For  example,  separable  Brownian  motion  has  almost  surely  continuous  sample  paths  and 
with the aid of the Blumenthal Zero-One Law [6] we see that its canonical filtration is
continuous. Conversely, martingales with respect to right continuous filtrations have
versions that are almost surely cadlag [7]. If we heuristically think of the canonical
filtration $\mathcal{F}_t$ as the "data" conveyed by the stochastic process $\{X(t) : t \in [0, \infty)\}$ up to and
including time t, we may be inclined to suppose that the continuity of the sample paths of
the process might prevent jumps in the "data" $\{\mathcal{F}_t : t \ge 0\}$; and we also might suppose that
the continuity of the "data" flow would influence the regularity of the sample paths of the
stochastic process. (In [1] we pointed out that this is a totally erroneous concept of data.)

In [8] we investigated what properties characterize filtrations of σ-algebras that are
continuous. In this work we showed that the regularity of the sample paths of a stochastic
process and the continuity of its associated filtration are logically independent; we presented
an example of a stochastic process with infinitely differentiable sample paths and a
discontinuous canonical filtration, and we also gave an example where a stochastic process
could have an arbitrarily irregularly prescribed sample path (e.g. non-Lebesgue
measurable) and a continuous canonical filtration. We also presented an example of a
stochastic process whose canonical filtration was discontinuous at every point. We then
went on and established conditions guaranteeing the continuity of a filtration of
σ-algebras. Also, we presented necessary and sufficient conditions for a filtration of
σ-algebras to be continuous, right continuous, or left continuous. For example, we
established the following results in [8]:

Theorem: Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_t : t \ge 0\}$ be a filtration on
$(\Omega, \mathcal{F}, P)$ so that $\mathcal{F}_0$ contains the P-null sets. Then the filtration is continuous at $t_0$ if and
only if for all $Y \in L_2(\Omega, \mathcal{F}, P)$, the stochastic process defined by $Y_t = E(Y \mid \mathcal{F}_t)$, $t \ge 0$, is
$L_2$ continuous at $t_0$.

Let $(\Omega, \mathcal{F}, P)$ be a probability space. A σ-subalgebra $\mathcal{A}$ of $\mathcal{F}$ is said to be
essentially countably generated if there exists a countable subset K of $\mathcal{F}$ so that
$\sigma(K) \vee (P\text{-null sets}) = \mathcal{A} \vee (P\text{-null sets})$.

Theorem: Let M be a separable metric space and $\{X(t) : t \ge 0\}$ be a stochastic
process taking values in M that is left or right continuous in probability. Then
$\sigma(X(t) : t \ge 0)$ is essentially countably generated.

Theorem: Let $(\Omega, \mathcal{F}, P)$ be a separable probability space. Then if $\{\mathcal{F}_t : t \ge 0\}$ is a
filtration on $(\Omega, \mathcal{F}, P)$ so that $\mathcal{F}_0$ contains the P-null sets, there exists a countable subset
C of R so that for $t \notin C$, $\mathcal{F}_{t-} = \mathcal{F}_{t+} = \mathcal{F}_t$.
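
To make the logical independence of path regularity and filtration continuity concrete, the following worked example may help. It is a standard textbook illustration and is not claimed to be the example constructed in [8]; the random variable Z and the process $X(t) = tZ$ are assumptions introduced here.

```latex
% Assumption: Z is a standard Gaussian random variable on (\Omega, \mathcal{F}, P).
% Define X(t) = tZ for t \ge 0, so every sample path is infinitely differentiable.
\begin{align*}
\mathcal{F}_0 &= \sigma(X(0)) \vee (P\text{-null sets})
               = \{\emptyset, \Omega\} \vee (P\text{-null sets}), \\
\mathcal{F}_t &= \sigma(X(s) : s \le t) \vee (P\text{-null sets})
               = \sigma(Z) \vee (P\text{-null sets}) \quad \text{for } t > 0, \\
\mathcal{F}_{0+} &= \bigcap_{s > 0} \mathcal{F}_s
               = \sigma(Z) \vee (P\text{-null sets}) \neq \mathcal{F}_0 .
\end{align*}
% Consistency with the first theorem: take Y = Z \in L_2. Then
% Y_0 = E(Z \mid \mathcal{F}_0) = 0 a.s. and Y_t = Z a.s. for t > 0, so
% E[(Y_t - Y_0)^2] = 1 for every t > 0, and Y_t is not L_2 continuous at t_0 = 0.
```

In this sketch the canonical filtration fails to be right continuous at 0 even though the sample paths are as smooth as one could wish.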

Now we comment upon some of our recent results on martingales. Frequently, in
estimation theory one derives a sequence of estimators, say $Y_n$, and one desires to show
that as $n \to \infty$, $Y_n$ converges in an appropriate sense. A typical example arises in an attempt
to estimate a second order random variable X as a function of the available data, say
$\{Z_n : n \in N\}$, by choosing $Y_n = E(X \mid Z_1, Z_2, \ldots, Z_n)$. In this endeavor, the martingale
convergence theorem often surfaces as a useful tool in establishing convergence.
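
As a minimal sketch of such a sequence of estimators, consider the benign special case (an assumption made here, not a model taken from the cited articles) in which X is standard Gaussian and $Z_i = X + N_i$ with independent zero mean Gaussian noise of common variance `noise_var`. In that case the conditional expectation has a closed form and its mean square error decays like 1/n. The function names are hypothetical.

```python
import random

def posterior_mean(zs, noise_var):
    """E(X | Z_1..Z_n) when X ~ N(0, 1) and Z_i = X + N_i, N_i ~ N(0, noise_var) i.i.d."""
    n = len(zs)
    return sum(zs) / (n + noise_var)

def theoretical_mse(n, noise_var):
    """Mean square error E[(X - Y_n)^2] in the same Gaussian model."""
    return noise_var / (n + noise_var)

def simulated_mse(n, noise_var, trials=20000, seed=0):
    """Monte Carlo estimate of the mean square error of posterior_mean."""
    rng = random.Random(seed)
    sd = noise_var ** 0.5
    total = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        zs = [x + rng.gauss(0.0, sd) for _ in range(n)]
        total += (x - posterior_mean(zs, noise_var)) ** 2
    return total / trials
```

This well-behaved case is only a baseline; it says nothing about how slow the convergence can be in general.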

However, in a practical circumstance, if one were interested in convergence and if n
corresponded to the progression of time, then the rate of convergence would also be of
concern. This would arise, for instance, if $Y_n$ represented the estimate after n samples of
data are taken and data is sampled at regularly spaced intervals. The key to establishing this
rate of convergence is intimately linked with the convergence rate of the martingale
convergence theorem. In [9] we examined the convergence rate of the martingale
convergence theorem, and we showed that this convergence can be nonuniform and,
consequently, arbitrarily slow. This result that the convergence rate of the martingale
convergence theorem can be arbitrarily slow is important not only from the obvious
practical viewpoint, but also from the viewpoint of the mathematician, since the martingale
convergence theorem is one of the key theorems of probability theory.

Another  aspect  of  the  martingale  convergence  theorem  which  we  investigated  was 
concerned  with  the  use  of  the  martingale  convergence  theorem  in  estimating  a  random 
variable X. Let X be a second order random variable, and let $\{Z_n : n \in N\}$ be a sequence
of random variables representing data. Often one may attempt to estimate X based upon the
first n terms of the data sequence by $E(X \mid Z_1, Z_2, \ldots, Z_n)$. In [10] we pointed out
some problems associated with an overly cavalier usage of the martingale convergence
theorem in this context. In particular, we gave an example where each of the above random
variables was zero mean Gaussian with a positive variance, $E(X \mid Z_1, Z_2, \ldots, Z_n) = 0$
almost surely for each $n \in N$, and yet for any positive integer k there exists a function
$f_k : R \to R$ so that $f_k(Z_k) = X$ pointwise on the underlying probability space.

In a similar context as the above, in [11] we noted that for a second order random
variable X, the rate of the $L_2$ convergence of $E[X \mid Y_1, Y_2, \ldots, Y_n]$ can crucially depend
upon X. That is, any $L_2$ perturbation in X could drastically alter the rate of convergence.

Another  aspect  of  estimation  theory  with  which  we  were  concerned  dealt  with  the 
idea  of  when  an  estimator  which  was  optimal  under  a  given  fidelity  criterion  would  also  be 
optimal under certain other fidelity criteria. A classical paper on this subject [12] was
written by Sherman, and this result is known in the engineering literature as Sherman's
theorem. However, a close inspection of [12] shows some erroneous claims. In [13] we
presented a correct derivation of the effort undertaken in [12]. The following theorem is a
correct version of Sherman's theorem and we proved it in [13].

Theorem: Let $k \in N$, $(\Omega, \mathcal{S}, P)$ be a probability space, and $X, Y_1, \ldots, Y_k$ be random
variables defined on $(\Omega, \mathcal{S}, P)$, with X integrable. Let $M : R^k \to R$ be a Borel measurable
function such that $M[Y_1(\omega), \ldots, Y_k(\omega)] = E[X \mid Y_1, \ldots, Y_k](\omega)$ a.s., and assume that
there exists a regular conditional distribution function of X conditioned on $\sigma(Y_1, \ldots, Y_k)$,
$F : R \times \Omega \to [0, 1]$, such that $F(x + M[Y_1(\omega), \ldots, Y_k(\omega)], \omega)$, as a function of x with $\omega$
fixed, is unimodal about the origin and symmetric. Then $M[Y_1, \ldots, Y_k]$ minimizes the
quantity $E[\Phi(X - f(Y_1, \ldots, Y_k))]$ over all Borel measurable functions $f : R^k \to R$ where
$\Phi : R \to [0, \infty)$ is even and nondecreasing on $[0, \infty)$.

Several  attempts  at  a  proof  of  the  above  result  have  been  presented  in  the  engineering 
literature,  and  each  that  we  know  of  is  wrong;  counterexamples  to  these  efforts  are  given  in 
[14]. 
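
The conclusion of Sherman's theorem can be checked numerically in a simple case. The sketch below assumes that the conditional distribution of X, shifted by its conditional mean, is standard Gaussian (symmetric and unimodal about the origin). It evaluates the expected cost of an estimate displaced from the conditional mean by `shift`, for two cost functions that are even and nondecreasing on $[0, \infty)$ but not convex. The helper names and the grid of candidate shifts are illustrative choices.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_cost(cost, shift, lo=-12.0, hi=12.0, steps=4800):
    """Midpoint-rule approximation of E[cost(Z - shift)] for Z ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += cost(z - shift) * std_normal_pdf(z)
    return total * h

def clipped_abs(e):
    # Even, nondecreasing on [0, inf), not convex.
    return min(abs(e), 1.0)

def zero_one(e):
    # Even, nondecreasing on [0, inf), not convex.
    return 1.0 if abs(e) > 1.0 else 0.0

# Candidate displacements from the conditional mean: -2.0, -1.9, ..., 2.0.
shifts = [k / 10.0 for k in range(-20, 21)]
best_clipped = min(shifts, key=lambda c: expected_cost(clipped_abs, c))
best_zero_one = min(shifts, key=lambda c: expected_cost(zero_one, c))
```

On this grid both minimizers come out at a displacement of zero, that is, at the conditional mean, as the theorem predicts for this symmetric unimodal case.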

Thus,  the  result  in  the  above  theorem  requires  a  regular  conditional  distribution 
function  that,  when  properly  shifted,  is  symmetric  and  unimodal  about  the  origin  and  a  cost 
function  that  is  nonnegative,  even,  and  nondecreasing  to  the  right  of  the  origin.  It  is  easy 
to see that if in this theorem we let k = 1 and X and $Y_1$ be mutually Gaussian random
variables then the resulting regular conditional distribution function is symmetric and
unimodal about $E[X \mid Y_1](\omega)$ for any fixed $\omega$. This special case explains why Sherman's
theorem is often invoked to add a token claim of generality to papers that only consider
Gaussian distributions. When one attempts to venture outside this somewhat limited arena,
however, the conditions which the above theorem places on the regular conditional distribution
function immediately begin to feel overly restrictive. After all, how comfortable should we
be with the assumption that the regular conditional distribution function under consideration
is unimodal about the conditional mean? The conditions on the cost function, on the other
hand, are extremely nonrestrictive and, in fact, allow for many interesting, albeit
impractical, choices. For example, the cost function given by

$$\Phi(x) = \int_0^{|x|} I_C(t)\, dt,$$

where C denotes a Cantor subset of $[0, \infty)$ of positive Lebesgue measure, satisfies the
conditions of the above theorem. This imbalance suggests the possibility of lessening the
restrictions on the regular conditional distribution function by perhaps slightly increasing
the restrictions imposed on the cost function. In [14], we presented a more general
treatment of this subject. The following results are presented in [14]. Notice that
the first result dispenses with the unimodality assumption, and the second result allows us
to base our estimate upon random variables measurable with respect to a non countably
generated σ-algebra, such as, for instance, that which may be generated by a random
object.

Theorem: Let $k \in N$, $(\Omega, \mathcal{S}, P)$ be a probability space, and $X, Y_1, \ldots, Y_k$ be random
variables defined on $(\Omega, \mathcal{S}, P)$, with X integrable. Let $M : R^k \to R$ be a Borel measurable
function such that $M[Y_1(\omega), \ldots, Y_k(\omega)] = E[X \mid Y_1, \ldots, Y_k](\omega)$ a.s., and assume that
there exists a regular conditional distribution function of X conditioned on $\sigma(Y_1, \ldots, Y_k)$,
$F : R \times \Omega \to [0, 1]$, such that $F(x + M[Y_1(\omega), \ldots, Y_k(\omega)], \omega)$, as a function of x with $\omega$
fixed, is symmetric. Then $M[Y_1, \ldots, Y_k]$ minimizes the quantity $E[\Phi(X - f(Y_1, \ldots, Y_k))]$
over all Borel measurable functions $f : R^k \to R$ where $\Phi : R \to [0, \infty)$ is even and convex.

Theorem: Let $(\Omega, \mathcal{S}, P)$ be a probability space, $\mathcal{A}$ be a σ-subalgebra of $\mathcal{S}$, and X be a
random variable defined on $(\Omega, \mathcal{S}, P)$ such that X is integrable. For each $\omega \in \Omega$, let
$M(\omega) = E[X \mid \mathcal{A}](\omega)$, and assume that there exists a regular conditional distribution function
of X conditioned on $\mathcal{A}$, $F : R \times \Omega \to [0, 1]$, such that $F(x + M(\omega), \omega)$, as a function of x with
$\omega$ fixed, is symmetric. Then M minimizes the quantity $E[\Phi(X - \hat{X})]$ over all $\mathcal{A}$-measurable
random variables $\hat{X}$, where $\Phi : R \to [0, \infty)$ is even and convex.
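
The trade between unimodality and convexity can be seen in a two-point example. The sketch below is an illustration constructed here, not an example from [14]. X takes the values -1 and +1 with probability 1/2 each, so its distribution is symmetric about its mean 0 but not unimodal. With no conditioning data, the estimate is a constant.

```python
from fractions import Fraction

# X = -1 or +1 with probability 1/2 each: symmetric about 0, not unimodal about 0.
outcomes = [(Fraction(-1), Fraction(1, 2)), (Fraction(1), Fraction(1, 2))]

def expected_cost(cost, estimate):
    """E[cost(X - estimate)] for the two-point distribution above."""
    return sum(p * cost(x - estimate) for x, p in outcomes)

def squared(e):
    # Even and convex.
    return e * e

def absolute(e):
    # Even and convex.
    return abs(e)

def zero_one(e):
    # Even and nondecreasing on [0, inf), but not convex.
    return Fraction(1) if abs(e) >= Fraction(1, 2) else Fraction(0)

# Candidate constant estimates -2, -7/4, ..., 2.
candidates = [Fraction(k, 4) for k in range(-8, 9)]
```

For the two convex costs the mean 0 attains the minimum over the candidates. For the nonconvex zero-one cost the mean has expected cost 1, whereas the estimate 1 has expected cost 1/2, so the mean is no longer optimal once both unimodality and convexity are dropped.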


In  [15]  and  [16]  our  concern  was  directed  toward  fusing,  or  combining,  estimates 
based  upon  a  finite  number  of  estimates  of  a  fixed  second  order  random  variable  X  in  order 
to achieve a single “best” estimate of X. For example, if $X, Y_1, Y_2, \ldots, Y_n$ are random
variables and X is second order, how might $E[X \mid Y_1], E[X \mid Y_2], \ldots, E[X \mid Y_n]$ be
combined so as to approximate X in a minimum mean square sense? Although aspects of
this  problem  have  been  considered  in  the  literature,  we  know  of  no  other  work  in  this  area 
that  is  correct.  To  illustrate  some  subtleties  in  this  area,  note  the  following  two  examples. 


Example: For an integer n > 1, consider a set of random variables $\{X, Y_1, \ldots, Y_n\}$ with
a joint probability density function given by

$$f(x, y_1, \ldots, y_n) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n+1} \exp\left[-\frac{1}{2}\left(x^2 + \sum_{i=1}^{n} y_i^2\right)\right] \left[1 + x \exp\left(-\frac{x^2}{2}\right) \prod_{i=1}^{n} \left(y_i \exp\left(-\frac{y_i^2}{2}\right)\right)\right].$$

It follows straightforwardly that the set $\{X, Y_1, \ldots, Y_n\}$ is not mutually Gaussian and
not mutually independent, yet any proper subset of $\{X, Y_1, \ldots, Y_n\}$ containing at least
two random variables is mutually independent, mutually Gaussian, and identically
distributed with each random variable having zero mean and unit variance. For any
nonempty proper subset $\mathcal{D}$ of $\{Y_1, \ldots, Y_n\}$, we note that $E[X \mid \mathcal{D}] = 0$ a.s. since X is
independent of $\mathcal{D}$. However, it follows quickly that

$$E[X \mid Y_1, \ldots, Y_n] = \frac{1}{2\sqrt{2}}\, Y_1 \cdots Y_n \exp\left[-\frac{1}{2}\left(Y_1^2 + Y_2^2 + \cdots + Y_n^2\right)\right] \quad \text{a.s.}$$

Thus, since any Borel measurable function of the estimates $E[X \mid \mathcal{D}]$ where $\mathcal{D}$ ranges over
all nonempty proper subsets of $\{Y_1, \ldots, Y_n\}$ would be constant almost surely, it would
not be reasonable to attempt to estimate $E[X \mid Y_1, \ldots, Y_n]$ based on a combination of
these estimates.
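
The conditional expectation in this example can be checked numerically. The sketch below assumes the joint density is the product of n + 1 standard Gaussian densities multiplied by $1 + x e^{-x^2/2} \prod_i y_i e^{-y_i^2/2}$, which is our reading of the formula, and compares a direct numerical integration over x with the closed form $\frac{1}{2\sqrt{2}} Y_1 \cdots Y_n \exp[-\frac{1}{2}\sum_i Y_i^2]$. Function names and the integration grid are illustrative.

```python
import math

def joint_density(x, ys):
    """Assumed joint density of (X, Y_1..Y_n) evaluated at (x, ys)."""
    n = len(ys)
    gauss = (2.0 * math.pi) ** (-(n + 1) / 2.0) * math.exp(
        -0.5 * (x * x + sum(y * y for y in ys)))
    bump = x * math.exp(-0.5 * x * x)
    for y in ys:
        bump *= y * math.exp(-0.5 * y * y)
    return gauss * (1.0 + bump)

def integrate(g, lo=-10.0, hi=10.0, steps=4000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

def conditional_mean_numeric(ys):
    """E[X | Y = ys] as a ratio of two numerical integrals over x."""
    num = integrate(lambda x: x * joint_density(x, ys))
    den = integrate(lambda x: joint_density(x, ys))
    return num / den

def conditional_mean_formula(ys):
    """Closed form: prod(ys) * exp(-sum(ys^2)/2) / (2*sqrt(2))."""
    return math.prod(ys) * math.exp(-0.5 * sum(y * y for y in ys)) / (2.0 * math.sqrt(2.0))
```

For instance, with n = 2 and `ys = [0.7, -1.3]` the two functions should agree to within the accuracy of the quadrature.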
Example: Let $\Omega = [0, 1]$, $\mathcal{F}$ denote the Borel subsets of $\Omega$, and P denote Lebesgue
measure on $\mathcal{F}$. Let A be a positive real number, $\sigma(Y_1) = \sigma([0, 1/2))$, $\sigma(Y_2) = \sigma([1/4, 3/4))$,
and $X(\omega) = A$ for $\omega \in [0, 1/4) \cup [1/2, 3/4)$ and $X(\omega) = -A$ for
$\omega \in [1/4, 1/2) \cup [3/4, 1]$. Then it straightforwardly follows that $E[X \mid Y_1] = E[X \mid Y_2] = 0$
a.s., but $E[X \mid Y_1, Y_2] = X$ a.s. Notice that in this special case, any linear combination of
$E[X \mid Y_1]$ and $E[X \mid Y_2]$ yields an estimate equal to 0 a.s., resulting in a mean square error in
approximating X of $A^2$, which can exceed any preassigned real number. Recalling that
$E[X \mid Y_1]$ and $E[X \mid Y_2]$, respectively, are $\sigma(Y_1)$-measurable and $\sigma(Y_2)$-measurable, we see
that $E[X \mid Y_1] = E[X \mid Y_2] = 0$ pointwise in $\omega$; similarly, we see that $E[X \mid Y_1, Y_2] = X$
pointwise in $\omega$. Thus, in this situation, it is fruitless to attempt to approximate X based on
any function of $E[X \mid Y_1]$ and $E[X \mid Y_2]$.
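
Because everything in this example is constant on the four quarter-intervals of [0, 1], it can be verified with exact arithmetic. The sketch below assumes $Y_1$ and $Y_2$ are the indicator functions of [0, 1/2) and [1/4, 3/4), which generate the stated σ-algebras. The value A = 3 and the helper `cond_exp` are arbitrary illustrative choices.

```python
from fractions import Fraction

A = Fraction(3)  # any positive value works; 3 is an arbitrary choice

# The four quarter-intervals of [0, 1], each of Lebesgue measure 1/4, indexed 0..3.
prob = [Fraction(1, 4)] * 4
X = [A, -A, A, -A]   # X on [0,1/4), [1/4,1/2), [1/2,3/4), [3/4,1]
Y1 = [1, 1, 0, 0]    # indicator of [0, 1/2)
Y2 = [0, 1, 1, 0]    # indicator of [1/4, 3/4)

def cond_exp(values, *conditioning):
    """E[values | conditioning], evaluated on each quarter-interval."""
    out = []
    for i in range(4):
        key = tuple(c[i] for c in conditioning)
        cell = [j for j in range(4) if tuple(c[j] for c in conditioning) == key]
        mass = sum(prob[j] for j in cell)
        out.append(sum(prob[j] * values[j] for j in cell) / mass)
    return out

e1 = cond_exp(X, Y1)        # E[X | Y1]
e2 = cond_exp(X, Y2)        # E[X | Y2]
e12 = cond_exp(X, Y1, Y2)   # E[X | Y1, Y2]
mse_of_zero = sum(p * x * x for p, x in zip(prob, X))
```

Here `e1` and `e2` vanish on every quarter-interval, `e12` reproduces X, and the mean square error of the zero estimate equals A squared.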

In [15] and [16] we proved the following theorem.

Theorem: Consider a probability space $(\Omega, \mathcal{F}, P)$ and random variables $X, N_1, \ldots, N_n$
defined on $(\Omega, \mathcal{F}, P)$ where n is a positive integer and X is a second order random variable.
Further, assume that for each positive integer $i \le n$, $N_i$ has a zero mean Gaussian
distribution with positive variance given by $\sigma_i^2$, and that $X, N_1, \ldots, N_n$ are mutually
independent. Define $Y_i = X + N_i$ for $i = 1, \ldots, n$. Then there exists a Borel
measurable function $g : R^n \to R$ such that $E[X \mid Y_1, \ldots, Y_n] = g(E[X \mid Y_1], \ldots, E[X \mid Y_n])$
a.s.
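
The theorem covers any second order X. In the special case where X itself is zero mean Gaussian (an assumption made only for this sketch), the fusing function g can be written down explicitly, because each single-sensor estimate is an invertible linear function of its datum. The names `signal_var`, `noise_vars`, and `fuse` are hypothetical.

```python
def single_estimates(ys, signal_var, noise_vars):
    """E[X | Y_i] = signal_var / (signal_var + noise_var_i) * Y_i for Gaussian X."""
    return [signal_var / (signal_var + v) * y for y, v in zip(ys, noise_vars)]

def joint_estimate(ys, signal_var, noise_vars):
    """E[X | Y_1..Y_n] for X ~ N(0, signal_var), computed directly from the data."""
    precision = 1.0 / signal_var + sum(1.0 / v for v in noise_vars)
    return sum(y / v for y, v in zip(ys, noise_vars)) / precision

def fuse(estimates, signal_var, noise_vars):
    """The function g in this Gaussian case: recover each Y_i, then combine."""
    ys = [e * (signal_var + v) / signal_var for e, v in zip(estimates, noise_vars)]
    return joint_estimate(ys, signal_var, noise_vars)
```

Applying `fuse` to the output of `single_estimates` reproduces `joint_estimate` on the raw data. For non-Gaussian X the function g guaranteed by the theorem need not have this linear form.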


A  Monte  Carlo  variance  reduction  technique  known  as  importance  sampling  has 
recently  been  applied  to  many  problems  in  data  communications.  This  technique  holds  the 
promise of offering vast improvements to traditional Monte Carlo methods. In [17] and
[18]  we  considered  importance  sampling  applied  to  the  estimation  of  tail  probabilities.  In 
this  work  we  gave  counterexamples  to  some  commonly  used  types  of  importance 
sampling.  Then  we  introduced  a  new  method  of  importance  sampling,  which  we  called 
Importance  Sampling  via  a  Simulacrum,  and  we  illustrated  how  it  could  outperform  some 
other  methods  of  importance  sampling. 
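
For readers unfamiliar with the basic idea, the following sketch estimates a Gaussian tail probability by a textbook mean-shift form of importance sampling. It is not the simulacrum method of [17] and [18], and the choice of shifting the sampling mean to the threshold is an assumption made for illustration.

```python
import math
import random

def tail_probability_is(threshold, n_samples, seed=0):
    """Estimate P(Z > threshold), Z ~ N(0, 1), by sampling from N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(threshold, 1.0)
        if y > threshold:
            # Likelihood ratio of N(0, 1) to N(threshold, 1), evaluated at y.
            total += math.exp(-threshold * y + 0.5 * threshold * threshold)
    return total / n_samples

def tail_probability_exact(threshold):
    """Exact value via the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

With a threshold of 4 the exact probability is on the order of $10^{-5}$, so a plain Monte Carlo run of comparable size would see almost no exceedances, whereas roughly half of the shifted samples contribute to the weighted estimate.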

In  other  papers  we  pointed  out  how  wrong  certain  commonly  accepted  techniques 
and results in statistical signal processing can be. In [19] we presented a collection of
counterexamples in detection and estimation. In [20] we presented a collection of
counterexamples in conditioning. In [21], we presented a collection of counterexamples in
maximum likelihood estimation. In [22] and [23] we presented some comments on some
problems in Kalman filtering. The papers noted in this paragraph provide several
counterexamples to what is often taken as common knowledge in the literature of statistical
signal processing. A copy of [20] is appended to this report.

Another  direction  of  our  research  efforts  was  in  the  area  of  contention  resolution 
for local area computer networks. In the last few years, packet broadcasting random
multiple-access  computer  communication  networks  have  been  commercially  available.  A 
typical  example  of  such  a  network  is  the  Ethernet,  developed  by  Xerox,  which  was 
designed  based  on  the  idea  of  carrier  sense  multiple  access  with  collision  detection.  In 
Ethernet,  a  station  among  a  number  of  users  sharing  a  common  channel  will  listen  before 
transmitting  and  defer  if  the  channel  is  busy;  when  two  or  more  stations  collide,  each 
colliding  station  waits  for  a  random  period  of  time  before  retransmitting.  Although 
Ethernet has the advantage of easy interconnection of stations to the common channel and it
provides a high level of utilization of the channel, it does not truly address the problem of
how to effectively resolve collisions in the channel. Thus, a packet involved in a collision
may incur excessive delay due to waiting and abortion of transmission. Recently, a
protocol called Enet II was introduced [24] as a candidate for the second generation of
Ethernet. This protocol is designed to effectively resolve contention in a broadcast
multiple-access network such as Ethernet. We investigated the Enet II protocol in [25], and
in this investigation, we gave expressions for the average time required to resolve a
collision involving k stations and the average time for a particular station involved in a
k-way collision to send its packet successfully. Our results in this area were derived
analytically,  without  recourse  to  efforts  based  on  approximations  or  simulations.  In  [25] 
we  also  considered  the  efficiency  of  the  protocol,  and  we  derived  a  lower  bound  for  the 
maximum  efficiency. 
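
To give a feel for the quantity "average time required to resolve a collision involving k stations", here is a deliberately simplified slotted simulation. The retransmission rule is an assumption made for this sketch. It is neither Ethernet's backoff rule nor the Enet II protocol of [24], and it stands in contrast to the analytical derivations of [25].

```python
import random

def slots_to_resolve(k, seed=0):
    """Slots until all k colliding stations have transmitted successfully.

    Toy model (an assumption): in each slot every backlogged station transmits
    with probability 1/m, where m is the number of stations still backlogged;
    a slot succeeds when exactly one station transmits.
    """
    rng = random.Random(seed)
    backlog = k
    slots = 0
    while backlog > 0:
        slots += 1
        transmitters = sum(1 for _ in range(backlog) if rng.random() < 1.0 / backlog)
        if transmitters == 1:
            backlog -= 1
    return slots

def average_slots(k, trials=2000):
    """Average of slots_to_resolve over independent seeded runs."""
    return sum(slots_to_resolve(k, seed=t) for t in range(trials)) / trials
```

In this toy model a two-way collision takes three slots on average: two expected slots for the first success, at success probability 1/2 per slot, plus one slot for the remaining station.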

In  the  area  of  image  processing,  a  modest  effort  was  directed  toward  studying  the 
properties of a data compression scheme for image processing. In [26] we considered a
modification  of  an  existing  data  compression  scheme  which  allowed  more  general  ways  of 
processing  the  image  data  while  maintaining  the  favorable  data  compression  rates. 

We also devoted some effort to the problem of signal detection. In [27] we
studied the problem of choosing the nonlinearity $g(\cdot)$ when the test statistic was
constrained to be of the form

$$T = \sum_{i=1}^{n} g(x_i),$$

where the $x_i$'s represented our observations. Observe that in the case of a constant signal
additively corrupted by mutually independent, identically distributed noise, the Neyman-
Pearson test statistic is of the above form. If the noise sequence were not mutually
independent, then the test statistic would not necessarily be of this form. However, it
might seem reasonable to suppose that in some cases, if the noise were "almost mutually
independent" then a test statistic of the above form might be a reasonable approximation to
an appropriate test statistic. In [27] we studied the problem of choosing the function $g(\cdot)$
so as to maximize the asymptotic relative efficiency of this detector relative to any other
detector of this form with a different nonlinearity.


In [28] we studied another aspect of statistical hypothesis testing. Consider the
situation of testing one simple hypothesis against another simple hypothesis. The
likelihood ratio (i.e. a Radon-Nikodym derivative) often arises; and it is known that in
several contexts (e.g. Neyman-Pearson, Bayes, minimax) an optimum test is given by
comparing the likelihood ratio against an appropriately chosen threshold. In [28] we
studied the question of when a likelihood ratio with respect to two probability measures $P_0$
and $P_1$ might also be the likelihood ratio with respect to another pair of probability
measures $Q_0$ and $Q_1$ on the same measurable space. In this way, one likelihood ratio
might implement an optimum processing of the data for many pairs of probability
measures; that is, an optimal data processor might be optimal even when different
probability measures are governing the data. For the moment, consider the case where $P_0$
is absolutely continuous with respect to $P_1$; we gave examples where the Radon-Nikodym
derivative $dP_0/dP_1$ was the likelihood ratio not only for testing $P_0$ against $P_1$, but also for
testing $Q_0$ against $Q_1$, even when $P_0$ was extremely dissimilar from $Q_0$ and $P_1$ was
extremely dissimilar from $Q_1$.
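
A small discrete illustration of one likelihood ratio serving two pairs of measures is sketched below. The four-point sample space and the particular probability mass functions are constructed here for illustration and are far milder than the examples in [28].

```python
from fractions import Fraction as F

# Two pairs of probability mass functions on a four-point space.
P1 = [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]
P0 = [F(1, 8), F(1, 8), F(3, 8), F(3, 8)]
Q1 = [F(9, 20), F(1, 20), F(1, 20), F(9, 20)]
Q0 = [F(9, 40), F(1, 40), F(3, 40), F(27, 40)]

def likelihood_ratio(numerator, denominator):
    """Pointwise ratio of two strictly positive probability mass functions."""
    return [a / b for a, b in zip(numerator, denominator)]

lr_p = likelihood_ratio(P0, P1)
lr_q = likelihood_ratio(Q0, Q1)
```

Both ratios equal (1/2, 1/2, 3/2, 3/2), so any threshold test built on the likelihood ratio for the first pair is also a likelihood ratio test for the second pair.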


In some recent efforts, we have investigated some aspects of linear systems.
Although the subject of linear systems has fully matured as a research area, we have
uncovered some unappreciated aspects of the theory. In [29] (see also [30]) we
investigated the representation of linear systems. In this work we established the following
result.


Theorem: Let $\Omega$ be a locally compact separable metric space, $\mu$ be a σ-finite measure on
$\mathcal{B}(\Omega)$ (where we use $\mathcal{B}(\cdot)$ to denote the Borel subsets of a topological space), and $\lambda$ be a
Borel measure on a locally compact separable metric space W. Let
$T : L_1(\Omega, \mathcal{B}(\Omega), \mu) \to L_1(W, \mathcal{B}(W), \lambda)$ be a positive, continuous, linear map. Then
there exists $K : \mathcal{B}(W) \times \Omega \to [0, \infty]$ so that

(i) For each $\omega \in \Omega$, $K(\cdot, \omega)$ is a regular Borel measure on $\mathcal{B}(W)$.

(ii) For each $E \in \mathcal{B}(W)$, $K(E, \cdot)$ is measurable on $\Omega$.

(iii) For each $A \in \mathcal{B}(\Omega)$ with $\mu(A) < \infty$, the measure

$$\int_A K(\cdot, \omega)\, d\mu(\omega)$$

defined on $\mathcal{B}(W)$ is regular and $\lambda$-absolutely continuous.

(iv) $$T(f) = \frac{d}{d\lambda} \int_\Omega K(\cdot, \omega) f^+(\omega)\, d\mu(\omega) - \frac{d}{d\lambda} \int_\Omega K(\cdot, \omega) f^-(\omega)\, d\mu(\omega)$$

for $f \in L_1(\Omega, \mathcal{B}(\Omega), \mu)$, where by this notation, we mean the difference of the Radon-
Nikodym derivatives of the measures given by the integrals.

Convolution  frequently  arises  in  the  study  of  linear  systems.  In  [31]  we 
constructed  two  bounded,  Lebesgue  integrable,  nowhere  zero  functions  whose  convolution 
is  identically  zero.  This  phenomenon  seems  to  have  been  overlooked  by  many  working  in 
the  area  of  linear  systems.  In  particular,  it  dashes  any  hope  of  deconvolution  in  this 
situation. Also, although it is well known that $L_1(R)$, equipped with the operations of
pointwise addition and convolution, is a commutative Banach algebra, this result shows
that this commutative Banach algebra $L_1(R)$ is not an integral domain. Indeed, it shows

much  more  than  this,  since  there  exist  two  nowhere  zero  integrable  functions  whose 
convolution  is  everywhere  zero.  In  [32]  we  showed  the  analogous  result  for  sequences. 
That  is,  we  showed  that  there  exist  two  summable,  nowhere  zero  sequences  whose 
convolution  was  identically  zero. 
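
The idea of zero divisors under convolution is easy to exhibit in a finite cyclic setting. That is only an analogy: the results of [31] and [32] concern ordinary convolution on R and on infinite sequences, where the construction is far more delicate, and two nonzero finitely supported sequences can never have identically zero ordinary convolution. The sequences below are chosen for illustration.

```python
def circular_convolution(a, b):
    """Circular (cyclic) convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

a = [1, 1, 1, 1]      # nowhere zero
b = [1, -1, 1, -1]    # nowhere zero
c = circular_convolution(a, b)
```

Every entry of `c` is the sum of the entries of `b`, which is zero, so the convolution vanishes identically even though neither factor has a zero entry.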

This  has  been  a  brief  survey  of  our  accomplishments;  more  details  can  be  found  in 
the  indicated  publications.  These  accomplishments  further  our  understanding  of  many 
aspects  of  estimation  theory,  of  the  performance  of  a  contention  resolution  scheme  for  local 
area  computer  networks,  of  data  compression  for  image  processing,  of  signal  detection 
theory,  and  of  linear  system  theory. 
ABSTRACT

The  concept  of  conditioning  in  probability  theory  forms 
the  basis  for  study  in  many  areas  of  information  sciences  and 
systems.  Even  so,  the  subject  of  conditioning  is  often 
shrouded  in  heuristics,  misunderstood,  and  misused.  This 
paper  considers  several  aspects  of  conditioning  with  an 
emphasis  on  applications  and  explores  several  consequences 
of  an  overly  cavalier  approach  to  the  oft  neglected  measure- 
theoretic  subtleties  involved  in  this  area. 

I. INTRODUCTION

Conditioning  in  probability  theory  is  a  widely  recurring 
concept  in  many  areas  of  information  sciences  and  systems. 
For example, conditioning is central to many popular techniques
in applied probability and, in fact, lies at the heart of
many aspects of estimation theory. In spite of this widespread
popularity, the subject of conditioning is commonly
misunderstood and tools involving conditioning are frequently
misapplied. To rephrase Doob [5, p. v], conditioning
is simply a branch of measure theory, and no attempt
should be made to sugarcoat this fact. Unfortunately, many
efforts at research have apparently been undertaken without
appropriate concern for the measure-theoretic subtleties
associated with the concept of conditioning. In this paper
we review several aspects of conditioning and make a modest
attempt to suggest caveats which seem to have been
frequently overlooked by many in this area. Although several
of our results are well known to the specialist in measure
theory,  they  nevertheless  seem  to  have  been  overlooked  by 
many  working  in  information  sciences  and  systems. 

In what follows, for a topological space $T$, we will let
$\mathcal{B}(T)$ denote the family of Borel subsets of $T$. Also, we
recall that a subset of a set is said to be cocountable if its
complement is countable. Further, for a subset $S$, we will
let $S^c$ denote the complement of $S$, and we will let $I_S$ denote
the indicator function of $S$. In addition, we will let $\mathbb{N}$ denote
the set of positive integers, $\mathbb{Q}$ denote the set of rational numbers,
and $\mathbb{R}$ denote the set of real numbers. Finally, for a
random variable $X$, $\sigma(X)$ will denote the $\sigma$-subalgebra
generated by $X$.

II.  SIGMA-ALGEBRAS 

The topic of $\sigma$-algebras is basic to the subject of
conditioning since conditioning is conventionally taken with
respect to a $\sigma$-algebra. In many cases the $\sigma$-algebra of
interest is that generated by some random variables
representing data. Hence, in applications, it is common to
treat $\sigma$-algebras as somehow representing knowledge or
information associated with data. Consider the following
example from [2, pp. 458-459] which shows that associating
$\sigma$-algebras with knowledge, or information, as commonly
understood, can lead to incorrect conclusions.

Consider the probability space $([0,1], \mathcal{B}([0,1]), \lambda)$,
where $\lambda$ denotes Lebesgue measure on $\mathcal{B}([0,1])$, and
consider the $\sigma$-subalgebra $\mathcal{G}$ given by the family of all
subsets of $[0,1]$ which are either countable or cocountable.
Now, for $B \in \mathcal{B}([0,1])$, consider the conditional
probability $P(B \mid \mathcal{G})$. Since $\mathcal{G}$ contains all singletons $\{\omega\}$,
and hence might be seen as being completely informative, an
overly cavalier investigator might suppose that $P(B \mid \mathcal{G})$ is
equal to $I_B$. In other words, one might rationalize that to
know the sets in $\mathcal{G}$ implies that one knows $\omega$ itself and hence
knows whether or not $\omega$ is contained in $B$, leading to the
conclusion that $P(B \mid \mathcal{G})$ should be one when $\omega$ is contained
in $B$ and zero otherwise. It follows trivially, however, from
the definition of conditional probability and the fact that every
set in $\mathcal{G}$ has probability zero or one, that $P(B \mid \mathcal{G}) =
P(B)$, except possibly on a countable subset of $[0,1]$.

For another example, consider a probability space
$(\Omega, \mathcal{F}, P)$. A commonly used model in estimation theory
treats data as a filtration $\{\mathcal{F}_n : n \in \mathbb{N}\}$ of
$\sigma$-subalgebras of $\mathcal{F}$. Suppose that the $\sigma$-algebra $\mathcal{F}$ is
separable; that is, suppose $\mathcal{F}$ is generated by a countable
family of subsets of $\Omega$. Does it follow that $\mathcal{F}_n$ is separable
for each $n$? As the following example illustrates, $\sigma$-subalgebras
of separable $\sigma$-algebras need not be separable.

Assume that $\Omega = [0,1]$ and $\mathcal{F} = \mathcal{B}([0,1])$. Further, let
$\mathcal{G}$ be the $\sigma$-subalgebra of $\mathcal{F}$ given by the countable and
cocountable subsets of $[0,1]$. Since $\mathcal{F} = \sigma((a,b) : 0 \le a < b
\le 1 \text{ and } a, b \in \mathbb{Q})$ it follows that $\mathcal{F}$ is separable. Assume
now that $\mathcal{G}$ is also separable; that is, assume that
$\mathcal{G} = \sigma(A_n : n \in \mathbb{N})$ where $A_n$ is a subset of $[0,1]$ for each $n$.
Since $\mathcal{G}$ contains only countable and cocountable subsets of
$[0,1]$, we may assume that $A_n$ is countable for each $n$
(replacing $A_n$ by its complement if necessary). Let
$B = \bigcup_{n=1}^{\infty} A_n$, and note that $B$ is also a countable subset of
$[0,1]$. Hence, there exists a real number $x$ in $[0,1]$ which is
not an element of $B$. Notice also that if $\mathcal{D}$ is the family of all
subsets of $B$ and their complements, then $\mathcal{D}$ is a $\sigma$-subalgebra
such that $\mathcal{G} \supset \mathcal{D} \supset \sigma(A_n : n \in \mathbb{N})$. But $\mathcal{D} \ne \mathcal{G}$ since
$\{x\}$ is in $\mathcal{G}$ but not in $\mathcal{D}$. This contradiction implies that $\mathcal{G}$ is
not separable even though it is a $\sigma$-subalgebra of the
separable $\sigma$-algebra $\mathcal{F}$.

Now, let $(\Omega, \mathcal{F})$ be a measurable space, and let $\mathcal{P}$ be a
family of probability measures on $(\Omega, \mathcal{F})$. The triple
$(\Omega, \mathcal{F}, \mathcal{P})$ is called a probability structure. If $\mathcal{S}$ is a $\sigma$-subalgebra
of $\mathcal{F}$, we say that $\mathcal{S}$ is sufficient if for each
$\mathcal{F}$-measurable bounded real valued function $f$ defined on $\Omega$,
there exists an $\mathcal{S}$-measurable bounded real valued function $g$
defined on $\Omega$ such that $\int_A f \, dP = \int_A g \, dP$ for each $A$ in $\mathcal{S}$ and
for all $P$ in $\mathcal{P}$. That is, $g$ is almost surely [$P$] equal to the
conditional expectation of $f$ conditioned on $\mathcal{S}$ when $P$ is the
relevant probability measure. Note that although $g$ does not
depend on $P$, the set of $P$-measure zero might depend on $P$.
It might be tempting and pleasing to the intuition to suppose
that if $\mathcal{S}$ were a sufficient $\sigma$-subalgebra, then any $\sigma$-subalgebra
of $\mathcal{F}$ which included $\mathcal{S}$ as a subset would also be
sufficient. The following example from [3] constructs a
nonsufficient $\sigma$-subalgebra which includes a sufficient
$\sigma$-subalgebra.

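Let $\mathcal{P}$ denote the family of probability measures $P$ on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $P(B) = P(-B)$ for any set $B$ in $\mathcal{B}(\mathbb{R})$
where, for any subset $B$ of $\mathbb{R}$, we define $-B =
\{x \in \mathbb{R} : -x \in B\}$. Let $\mathcal{A} = \{B \in \mathcal{B}(\mathbb{R}) : B = -B\}$ and
note that $\mathcal{A}$ is a $\sigma$-subalgebra of $\mathcal{B}(\mathbb{R})$. Further, $\mathcal{A}$ is a
sufficient $\sigma$-subalgebra since, given any bounded Borel
measurable function $f$, $g(x) = (f(x) + f(-x))/2$ is an
$\mathcal{A}$-measurable function for which $\int_A f \, dP = \int_A g \, dP$ for any
$A \in \mathcal{A}$ and any $P \in \mathcal{P}$.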
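
The symmetrization argument can be checked by hand on a finite toy case. The following is a minimal sketch, not part of the original example: the symmetric measure `P`, the function `f`, and the list `SYMMETRIC_EVENTS` are hypothetical choices made only for illustration. It confirms that $f$ and $g(x) = (f(x) + f(-x))/2$ have the same integral over every symmetric event, and that they can differ on a non-symmetric one.

```python
from fractions import Fraction

# Hypothetical symmetric probability measure on {-2, ..., 2}: P({x}) == P({-x}).
P = {
    -2: Fraction(1, 8), -1: Fraction(1, 4), 0: Fraction(1, 4),
    1: Fraction(1, 4), 2: Fraction(1, 8),
}

def f(x):
    """Arbitrary illustrative function (bounded on the finite support)."""
    return x ** 3 + 2 * x + 5

def g(x):
    """Symmetrized version of f, as in the text: (f(x) + f(-x)) / 2."""
    return Fraction(f(x) + f(-x), 2)

def integral(h, event):
    """Integral of h over the event with respect to P (a finite sum here)."""
    return sum(h(x) * P[x] for x in event)

# Every event A with A == -A on this support.
SYMMETRIC_EVENTS = [
    set(), {0}, {-1, 1}, {-2, 2}, {-1, 0, 1}, {-2, 0, 2},
    {-2, -1, 1, 2}, {-2, -1, 0, 1, 2},
]

agree = all(integral(f, A) == integral(g, A) for A in SYMMETRIC_EVENTS)
print(agree, integral(f, {1}), integral(g, {1}))
```

The exact rational arithmetic avoids any floating-point ambiguity in the equality checks.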

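Suppose now that $Z$ is a subset of $\mathbb{R}$ which contains 0
and for which $Z = -Z$. Also, define $\mathcal{D} =
\{B \cup A : B \in \mathcal{B}(\mathbb{R}),\ B \subset Z, \text{ and } A \in \mathcal{A}\}$. A straightforward
examination shows that $\mathcal{D}$ is a $\sigma$-subalgebra of
$\mathcal{B}(\mathbb{R})$ which includes $\mathcal{A}$.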

Assume that $\mathcal{D}$ is a sufficient $\sigma$-subalgebra and let $f$ be a
bounded Borel measurable function. Then there exists a
$\mathcal{D}$-measurable function $g$ for which $\int_D f \, dP = \int_D g \, dP$ for any
$D \in \mathcal{D}$ and any $P \in \mathcal{P}$. Let $x \in Z$ and note that $\{x\} \in \mathcal{D}$.
Choosing $D = \{x\}$ above then implies that $f(x)P(\{x\}) =
g(x)P(\{x\})$ for any measure $P$ in $\mathcal{P}$. Now let $x \in Z^c$ and
note that $\{x, -x\} \in \mathcal{D}$, $\{x\} \notin \mathcal{D}$, and $\{-x\} \notin \mathcal{D}$. Letting $D
= \{x, -x\}$ above implies that $(f(x) + f(-x))P(\{x\}) =
2g(x)P(\{x\})$ since $P(\{x\}) = P(\{-x\})$, by definition of $\mathcal{P}$,
and $g(x) = g(-x)$, since $g$ is $\mathcal{D}$-measurable. Given any
$x \in \mathbb{R}$, there exists a measure $P$ in $\mathcal{P}$ for which $P(\{x\}) > 0$.
Thus, we see that $g(x) = f(x)$ if $x \in Z$ and $g(x) =
(f(x) + f(-x))/2$ if $x \in Z^c$. Let $f(x) = -1$ if $x < 0$ and $f(x) = 1$
if $x \ge 0$. This choice for $f$ implies that $g$, as defined above,
is nonzero on $Z$ and zero on $Z^c$. Hence, we have that $Z
= (g^{-1}(\{0\}))^c \in \mathcal{D}$. Now choose a subset $Z_0$ of $\mathbb{R}$ which
contains 0, is such that $Z_0 = -Z_0$, and which is not an
element of $\mathcal{B}(\mathbb{R})$. (Such sets abound.) Substituting $Z_0$ for
$Z$ thus implies, based on the above discussion, that $Z_0 \in \mathcal{D}$.
But $Z_0$ cannot be in $\mathcal{D}$ since $Z_0 \notin \mathcal{B}(\mathbb{R})$. This contradiction
implies that $\mathcal{D}$ is not a sufficient $\sigma$-subalgebra even though it
includes the sufficient $\sigma$-subalgebra $\mathcal{A}$.

Filtrations of $\sigma$-algebras play a prominent role in many
areas  of  conditioning.  A  common  misconception  concerning 
filtrations  regards  the  relationship  between  the  regularity  of 
the  sample  paths  of  a  random  process  and  the  continuity  of 
its  canonical  filtration.  In  [9]  examples  are  given  in  which  a 
separable  random  process  with  a  continuous  filtration  has 
nonmeasurable  sample  paths,  a  random  process  with 
infinitely  differentiable  sample  paths  has  a  discontinuous 
canonical  filtration,  and  a  random  process  taking  values  in 
[0, 1]  has  a  canonical  filtration  which  is  everywhere 
discontinuous. 

III.  CONDITIONAL  PROBABILITY

Consider a subset $H$ of the interval $[0,1]$ with the
properties that the outer Lebesgue measure of $H$ is 1 and the
inner Lebesgue measure of $H$ is 0. (For a construction of
such a set, the interested reader is referred to [8, pp. 67-70].)
Further, let $\Omega = [0,1]$ and let $\lambda$ denote Lebesgue measure on
$\mathcal{B}([0,1])$. Define $\mathcal{F} = \{(H \cap B_1) \cup (H^c \cap B_2) : B_1, B_2
\in \mathcal{B}([0,1])\}$ and note that $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$ and that
$\mathcal{B}([0,1])$ is a $\sigma$-subalgebra of $\mathcal{F}$. Now, define a probability
measure $P : \mathcal{F} \to [0,1]$ on the measurable space $(\Omega, \mathcal{F})$ via
$P((H \cap B_1) \cup (H^c \cap B_2)) = (\lambda(B_1) + \lambda(B_2))/2$ to obtain a
probability space $(\Omega, \mathcal{F}, P)$. (That $P$ is well-defined follows
from the properties of $H$.)

Consider now this probability space $(\Omega, \mathcal{F}, P)$. The
following example, adapted from [2, p. 464, 33.13], shows
that conditional probabilities need not be measures.

Since $P(H) = 1/2$ and $P(B) = \lambda(B)$ for $B \in \mathcal{B}([0,1])$
implies that $P(H \cap B) = \lambda(B)/2 = P(H)P(B)$, it follows
that $H$ is independent of $\mathcal{B}([0,1])$. Let $F$ be a set in $\mathcal{F}$ with
probability zero and assume that $P(\,\cdot \mid \mathcal{B}([0,1]))(\omega)$ is a
probability measure on $\mathcal{F}$ for each $\omega$ outside of the null set
$F$. Note that there exists a collection $\{A_n : n \in \mathbb{N}\}$ of
subsets of $[0,1]$ such that $\mathcal{B}([0,1]) = \sigma(\{A_n : n \in \mathbb{N}\})$ and
such that $\{A_n : n \in \mathbb{N}\}$ is closed under finite intersections.
Define $K_n = \{\omega \in \Omega : P(A_n \mid \mathcal{B}([0,1]))(\omega) = I_{A_n}(\omega)\}$ and
note that $K_n \in \mathcal{B}([0,1])$ and $P(K_n) = 1$ for all $n \in \mathbb{N}$ since
$P(A_n \mid \mathcal{B}([0,1])) = I_{A_n}$ a.s. Now, let $K = \left( \bigcap_{n=1}^{\infty} K_n \right) \cap F^c$ and
note that $P(K) = 1$. Further, note that the function which,
for a fixed $\omega$ in $K$, maps an element $B$ of $\mathcal{B}([0,1])$ to $I_B(\omega)$
is a probability measure on $\mathcal{B}([0,1])$ which agrees with
$P(B \mid \mathcal{B}([0,1]))(\omega)$ whenever $B \in \{A_n : n \in \mathbb{N}\}$. Thus, the
Dynkin system theorem [1, p. 169] implies that for $\omega \in K$,
$P(B \mid \mathcal{B}([0,1]))(\omega)$ is uniquely determined to be $I_B(\omega)$ for
any set $B$ in $\mathcal{B}([0,1])$. Thus, in particular, if $\omega \in K$ then
$P(\{\omega\} \mid \mathcal{B}([0,1]))(\omega) = 1$. Now, recalling we assumed that
$P(\,\cdot \mid \mathcal{B}([0,1]))(\omega)$ is a probability measure on $\mathcal{F}$ for each $\omega$
outside of the null set $F$, we see that if $\omega \in H \cap K$ then
$P(H \mid \mathcal{B}([0,1]))(\omega) \ge P(\{\omega\} \mid \mathcal{B}([0,1]))(\omega) = 1$, and if
$\omega \in H^c \cap K$ then $P(H \mid \mathcal{B}([0,1]))(\omega) \le
P(\{\omega\}^c \mid \mathcal{B}([0,1]))(\omega) = 0$. Thus, if $\omega \in K$, then
$P(H \mid \mathcal{B}([0,1]))(\omega) = I_H(\omega)$. But $H$ and $\mathcal{B}([0,1])$ are
independent, and hence $P(H \mid \mathcal{B}([0,1])) = P(H) = 1/2$ a.s.
This contradiction implies that $P(\,\cdot \mid \mathcal{B}([0,1]))(\omega)$ is not
almost surely a probability measure on $\mathcal{F}$. Hence, a
conditional probability is not necessarily a measure.

A regular conditional probability allows one to sidestep
many of the undesirable aspects of conditional probability
since a regular conditional probability is by definition
required to be a measure for each fixed $\omega \in \Omega$. Unfortunately,
however, regular conditional probabilities do not
always exist. In fact, the situation detailed above, in
addition to showing that a conditional probability need not be
a measure, also provides an example in which a regular
conditional probability does not exist.

IV.  CONDITIONAL  INDEPENDENCE 

The  concept  of  conditional  independence  arises 
frequently  in  many  aspects  of  probability  theory.  For 
example,  the  concept  plays  an  important  role  in  the  study  of 
Markov  processes.  Unfortunately,  misconceptions  often 
arise  regarding  the  relationship  between  conditional 
independence  and  independence.  As  the  following  examples 
adapted  (with  a  correction)  from  [4,  p.221]  indicate,  the 
notions  of  independence  and  conditional  independence  taken 

with respect to a nontrivial $\sigma$-subalgebra are unrelated.

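Consider a probability space $(\Omega, \mathcal{F}, P)$ and a $\sigma$-subalgebra
$\mathcal{H}$ of $\mathcal{F}$. Further, let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two families
each composed of elements from $\mathcal{F}$. The families $\mathcal{H}_1$ and
$\mathcal{H}_2$ are said to be conditionally independent given $\mathcal{H}$ if
$P(A_1 \cap A_2 \mid \mathcal{H}) = P(A_1 \mid \mathcal{H}) \, P(A_2 \mid \mathcal{H})$ a.s. for all
$A_1 \in \mathcal{H}_1$ and $A_2 \in \mathcal{H}_2$. Further, two random variables $X$
and $Y$ defined on $(\Omega, \mathcal{F}, P)$ are said to be conditionally
independent given $\mathcal{H}$ if $\sigma(X)$ and $\sigma(Y)$ are conditionally
independent given $\mathcal{H}$.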

Let $X_1$ and $X_2$ be two independent identically distributed
random variables such that $P(X_1 = 1) = P(X_1 = -1) = 1/2$.
Further, let $Z = X_1 + X_2$, and let $A_i = X_i^{-1}(\{1\})$ for $i = 1$
and 2. In this case, $P(A_i \mid Z) = 1/2$ on $Z^{-1}(\{0\})$ for $i = 1$ or
2, and $P(A_1 \cap A_2 \mid Z) = 0$ on $Z^{-1}(\{0\})$. In particular,
$P(A_1 \cap A_2 \mid Z) \ne P(A_1 \mid Z) \, P(A_2 \mid Z)$ on an event of positive
probability. Thus, the independent random variables $X_1$ and
$X_2$ are not conditionally independent given $\sigma(Z)$.
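
Because the sample space here has only four equally likely outcomes, the conditional probabilities on the event where the sum is zero can be computed by direct enumeration. The following is a small illustrative sketch; the function name `conditional_probs_given_z` is our own and is not taken from the cited reference.

```python
from fractions import Fraction
from itertools import product

def conditional_probs_given_z(z):
    """Return (P(A1 | Z=z), P(A2 | Z=z), P(A1 and A2 | Z=z)).

    X1 and X2 are independent fair +/-1 signs, Z = X1 + X2, and
    Ai is the event {Xi = 1}. All four outcomes are equally likely,
    so conditioning on {Z = z} reduces to counting.
    """
    outcomes = [(x1, x2) for x1, x2 in product((-1, 1), repeat=2) if x1 + x2 == z]
    total = Fraction(len(outcomes))
    p_a1 = Fraction(sum(1 for x1, _ in outcomes if x1 == 1)) / total
    p_a2 = Fraction(sum(1 for _, x2 in outcomes if x2 == 1)) / total
    p_both = Fraction(sum(1 for x1, x2 in outcomes if x1 == 1 and x2 == 1)) / total
    return p_a1, p_a2, p_both

p_a1, p_a2, p_both = conditional_probs_given_z(0)
# The product of the conditional probabilities differs from the joint one.
print(p_a1 * p_a2, p_both)
```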

Consider now three mutually independent random
variables $Y_1$, $Y_2$, and $Y_3$ such that each random variable
takes on only integer values and such that $P(Y_i = m) < 1$ for
all integers $m$ and for $i = 1, 2, 3$. Further, let $S_2 = Y_1 + Y_2$
and $S_3 = Y_1 + Y_2 + Y_3$ and notice that $Y_1$ and $S_3$ are
dependent random variables. Let $B_i = S_2^{-1}(\{i\})$ for each
integer $i$. There exists $k$ such that $B_k$ has positive probability.
On such a set $B_k$ we have that

$$
\begin{aligned}
P(Y_1 = i, S_3 = j \mid S_2)
&= P(Y_1 = i, S_2 = k, S_3 = j) / P(S_2 = k) \\
&= P(Y_1 = i) \, P(Y_2 = k - i) \, P(Y_3 = j - k) / P(S_2 = k) \\
&= \bigl( P(Y_1 = i) \, P(Y_2 = k - i) / P(S_2 = k) \bigr) \, P(Y_3 = j - k) \\
&= \bigl( P(Y_1 = i, S_2 = k) / P(S_2 = k) \bigr) \, P(Y_3 = j - k) \\
&= P(Y_1 = i \mid S_2) \, P(Y_3 = j - k) \\
&= P(Y_1 = i \mid S_2) \, \bigl( P(Y_3 = j - k) \, P(S_2 = k) / P(S_2 = k) \bigr) \\
&= P(Y_1 = i \mid S_2) \, \bigl( P(S_2 = k, S_3 = j) / P(S_2 = k) \bigr) \\
&= P(Y_1 = i \mid S_2) \, P(S_3 = j \mid S_2).
\end{aligned}
$$

Thus, we conclude that even though $Y_1$ and $S_3$ are dependent
random variables, $Y_1$ and $S_3$ are conditionally independent
given $\sigma(S_2)$.
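
The factorization above can be verified exactly for any particular choice of integer-valued distributions. The sketch below uses three small probability mass functions (`PMF1`, `PMF2`, `PMF3`) that are hypothetical and chosen only for illustration; it checks that the first summand and the three-term sum are conditionally independent given the two-term sum, while failing to be independent unconditionally.

```python
from fractions import Fraction
from itertools import product

# Hypothetical pmfs for Y1, Y2, Y3 (illustration only, not from the paper).
PMF1 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
PMF2 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
PMF3 = {-1: Fraction(1, 4), 1: Fraction(3, 4)}

def joint():
    """Yield ((y1, s2, s3), probability) for every outcome of (Y1, Y2, Y3)."""
    for (a, pa), (b, pb), (c, pc) in product(PMF1.items(), PMF2.items(), PMF3.items()):
        yield (a, a + b, a + b + c), pa * pb * pc

def prob(pred):
    """Exact probability of the event described by pred(y1, s2, s3)."""
    return sum((p for o, p in joint() if pred(*o)), Fraction(0))

OUTCOMES = [o for o, _ in joint()]
Y1_VALUES = sorted({o[0] for o in OUTCOMES})
S2_VALUES = sorted({o[1] for o in OUTCOMES})
S3_VALUES = sorted({o[2] for o in OUTCOMES})

def factorizes_given_s2():
    """Check P(Y1=i, S3=j | S2=k) == P(Y1=i | S2=k) * P(S3=j | S2=k) for all i, j, k."""
    for k in S2_VALUES:
        pk = prob(lambda y, s2, s3: s2 == k)  # positive, since k is in the support
        for i, j in product(Y1_VALUES, S3_VALUES):
            lhs = prob(lambda y, s2, s3: y == i and s2 == k and s3 == j) / pk
            rhs = (prob(lambda y, s2, s3: y == i and s2 == k) / pk) * \
                  (prob(lambda y, s2, s3: s2 == k and s3 == j) / pk)
            if lhs != rhs:
                return False
    return True

def unconditionally_independent():
    """Check P(Y1=i, S3=j) == P(Y1=i) * P(S3=j) for all i, j."""
    return all(
        prob(lambda y, s2, s3: y == i and s3 == j)
        == prob(lambda y, s2, s3: y == i) * prob(lambda y, s2, s3: s3 == j)
        for i, j in product(Y1_VALUES, S3_VALUES)
    )

print(factorizes_given_s2(), unconditionally_independent())
```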

V.  CONDITIONAL  EXPECTATION 

Let $X$ be an integrable random variable defined on the
probability space $(\Omega, \mathcal{F}, P)$, and let $\mathcal{H}$ be a $\sigma$-subalgebra of
$\mathcal{F}$. Then can the conditional expectation $E[X \mid \mathcal{H}]$ be
expressed as $\int_{\Omega} X \, dP(\,\cdot \mid \mathcal{H})$, where $P(\,\cdot \mid \mathcal{H})$ denotes
conditional probability given $\mathcal{H}$? The alert reader will
immediately give a negative response to this question, since,
recalling Section III, $P(\,\cdot \mid \mathcal{H})$ might not be a measure and
hence the preceding integral might not even be defined.

The following example counters a common misconception
concerning versions of conditional expectations. In
particular, a random variable is given which is equal a.s. to a
conditional expectation yet is not a version of the conditional
expectation.

Consider the probability space consisting of $[0,1]$,
$\mathcal{B}([0,1])$, and Lebesgue measure on $\mathcal{B}([0,1])$, and let $\mathcal{G}$
denote the $\sigma$-algebra consisting of the countable and
cocountable subsets of $[0,1]$. Let $X$ be the identity map on
$[0,1]$ and note that $E[X \mid \mathcal{G}] = 1/2$ a.s. Further, let $Y =
\frac{1}{2}(1 - I_C)$ where $C$ denotes the Cantor ternary set. Note that
$Y = E[X \mid \mathcal{G}]$ a.s., yet $Y$ is not $\mathcal{G}$-measurable (since $C$ is
neither countable nor cocountable) and hence is not a version
of $E[X \mid \mathcal{G}]$.

Another commonly occurring misconception regarding
conditional expectation is that it is a "smoothing" operator.
Consider, for example, a random process $\{X(t) : t \in \mathbb{R}\}$
defined on a probability space $(\Omega, \mathcal{F}, P)$ and a $\sigma$-subalgebra
$\mathcal{H}$ of $\mathcal{F}$. It has been argued by some (see for instance
several recent papers in the area of perturbation analysis) that
$E[X(t) \mid \mathcal{H}]$ is "smoother" than $X(t)$ as a function of $t$. To
dispel this absurd notion simply let $X(t)$ be an
$\mathcal{H}$-measurable random process which is discontinuous
everywhere; the version of $E[X(t) \mid \mathcal{H}]$ given by $X(t)$
obviously retains this same property.

Perhaps a little less obvious is the fact that, for a random
variable $X$ on $(\Omega, \mathcal{F}, P)$ and a $\sigma$-subalgebra $\mathcal{G}$ of $\mathcal{F}$,
$E[X \mid \mathcal{G}]$ need not be as "smooth" a function of $\omega$ as $X$.
Consider for instance the probability space given by the
interval $[0,1]$, the $\sigma$-algebra $\mathcal{G}$ given by the countable and
cocountable subsets of $[0,1]$, and Lebesgue measure on $\mathcal{G}$.
If we let $X = 1$, then a version of $E[X \mid \mathcal{G}]$ is given by
$1 - I_B$ where $B$ equals the set of rationals in $[0,1]$. Hence,
even though $X$ is everywhere continuous, there exists a
version of $E[X \mid \mathcal{G}]$ which is everywhere discontinuous.

A commonly encountered property of conditional
expectation is the so-called nesting property. Unfortunately,
this property is sometimes misapplied. In the following example,
from [6], it is shown that $E[E[X \mid Y]]$ may exist even when
the expectation of $X$ does not exist. In other words, before
calculating $E[E[X \mid Y]]$ and claiming one has found the mean
of $X$, it is necessary to first ascertain that the mean of $X$
actually exists.

Consider random variables $X$ and $Y$ defined on the same
probability space such that $Y$ possesses a probability density
function $g(y)$ given by

$$ g(y) = \frac{1}{\sqrt{2\pi y}} \exp\left( -\frac{y}{2} \right); \quad y > 0, $$

and, for each $y > 0$, $Y$ is such that a conditional density function of $X$
given $Y = y$, denoted by $f(x \mid y)$, exists and is given by

$$ f(x \mid y) = \sqrt{\frac{y}{2\pi}} \exp\left( -\frac{y x^2}{2} \right); \quad y > 0 \text{ and } x \in \mathbb{R}. $$

It follows immediately that $E[X \mid Y] = 0$ a.s. and therefore $E[E[X \mid Y]]
= 0$. Notice, however, that the mean of $X$ does not exist
since $X$ has a Cauchy density $h(x)$ given by

$$ h(x) = \int_0^{\infty} f(x \mid y) \, g(y) \, dy = \frac{1}{\pi (1 + x^2)} \quad \text{for } x \in \mathbb{R}. $$
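
The mixture integral in this example can be checked numerically. The sketch below assumes one reading of the densities, namely that Y has a chi-square density with one degree of freedom and that X given Y = y is Gaussian with zero mean and variance 1/y; the function names and the integration parameters are our own choices. Under that assumption the marginal density of X agrees with the standard Cauchy density.

```python
import math

def g(y):
    """Assumed density of Y: exp(-y/2) / sqrt(2*pi*y) for y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

def f_cond(x, y):
    """Assumed conditional density of X given Y = y: N(0, 1/y)."""
    return math.sqrt(y / (2.0 * math.pi)) * math.exp(-y * x * x / 2.0)

def marginal_density(x, upper=80.0, steps=200_000):
    """Midpoint-rule approximation of the integral of f(x|y) g(y) over (0, upper).

    The integrand decays like exp(-y (1 + x^2) / 2), so truncating at
    y = 80 discards a negligible tail.
    """
    h = upper / steps
    return h * sum(f_cond(x, (k + 0.5) * h) * g((k + 0.5) * h) for k in range(steps))

def cauchy_density(x):
    """Standard Cauchy density 1 / (pi (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))

for x in (0.0, 0.5, 2.0):
    print(x, marginal_density(x), cauchy_density(x))
```

Since each conditional density is symmetric about zero, the inner conditional mean is zero for every y, while the Cauchy marginal has no mean; the nesting property therefore cannot be used here to assign a mean to X.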

For another example, consider random variables $X$ and
$Y$ each defined on the same probability space $(\Omega, \mathcal{F}, P)$ and
a $\sigma$-subalgebra $\mathcal{M}$ of $\mathcal{F}$. Another commonly encountered
misconception concerning conditional expectation is that
$E[X \mid \mathcal{M}]$ and $E[Y \mid \mathcal{M}]$ are independent if $X$ and $Y$ are
independent. The following counterexample, which
[12, p. 133] attributes to C. Sugahara, demonstrates that in
general this conclusion is false.

Let $U$ and $V$ be independent random variables, each
defined on the probability space $(\Omega, \mathcal{F}, P)$, and each having
a zero mean, unit variance Gaussian distribution. Define $X
= U + V$ and $Y = U - V$, and note that $X$ and $Y$ are
independent random variables each having a zero mean
Gaussian distribution with a variance of 2. Further,
$E[X \mid \sigma(U)] = E[U + V \mid \sigma(U)] = U + E[V \mid \sigma(U)] =
U + E[V] = U$ a.s., and $E[Y \mid \sigma(U)] = E[U - V \mid \sigma(U)] =
U - E[V \mid \sigma(U)] = U - E[V] = U$ a.s. Hence, any version of
$E[X \mid \sigma(U)]$ and any version of $E[Y \mid \sigma(U)]$ are equal almost
surely to the same positive variance Gaussian random
variable and hence cannot be independent. Further, we note
that even the ubiquitous Gaussian assumption does not
alleviate this problem.

Fatou's lemma and uniform integrability are powerful
tools in analysis and are often relied upon in the area of
estimation theory. We recall that if a sequence of random
variables is uniformly integrable then almost sure
convergence implies convergence of the corresponding
expectations. Convergence of conditional expectations with
respect to an arbitrary $\sigma$-subalgebra, however, does not
follow in general. The following example, adapted from
[16], describes a situation in which Fatou's lemma does not
hold and in which uniform integrability and almost sure
convergence do not imply that the corresponding conditional
expectations converge.

Let $\Omega = (0,1) \times (0,1)$, let $\mathcal{H} = \{B \times (0,1) : B \in
\mathcal{B}((0,1))\}$, and note that $\mathcal{H}$ is a $\sigma$-algebra on $\Omega$. Let $\mu$
denote Lebesgue measure on $\mathcal{B}((0,1))$, and let $P$ denote
Lebesgue measure on $\mathcal{B}((0,1) \times (0,1))$. For each positive
integer $n$, let $B_n = (0, \frac{1}{n})$, and let $A_n$ denote the $n$-th term in
the sequence $(0, \frac{1}{2})$, $(\frac{1}{2}, 1)$, $(0, \frac{1}{4})$, $(\frac{1}{4}, \frac{1}{2})$, $(\frac{1}{2}, \frac{3}{4})$, $(\frac{3}{4}, 1)$,
$(0, \frac{1}{8})$, $(\frac{1}{8}, \frac{1}{4})$, .... Note that $(\Omega, \mathcal{B}((0,1) \times (0,1)), P)$ is a
probability space, and that $\mathcal{H}$ is a $\sigma$-subalgebra of
$\mathcal{B}((0,1) \times (0,1))$. Now, for each positive integer $n$, define
a random variable $X_n(x, y) = \frac{1}{\mu(B_n)} I_{A_n \times B_n}(x, y)$. Let
$B \in \mathcal{B}((0,1))$, and note that

$$
\int_{B \times (0,1)} X_n \, dP
= \frac{1}{\mu(B_n)} \int_{B \times (0,1)} I_{A_n \times B_n} \, dP
= \mu(A_n \cap B)
= \int_{B \times (0,1)} I_{A_n \times (0,1)} \, dP,
$$

which thus implies
that $E[X_n \mid \mathcal{H}] = I_{A_n \times (0,1)}$ a.s. Now, note that $X_n \ge 0$
for each positive integer $n$, and that $X_n \to 0$ as $n \to \infty$. Note
also that, since $E[|X_n|] = \mu(A_n) \to 0$, the random variables
$\{X_n : n \in \mathbb{N}\}$ are uniformly integrable. Further, note that
$\limsup_{n \to \infty} I_{A_n \times (0,1)} = 1$ and that $\liminf_{n \to \infty} I_{A_n \times (0,1)} = 0$. Thus,
we see that, even though the random variables $\{X_n : n \in \mathbb{N}\}$
are uniformly integrable, $\limsup_{n \to \infty} E[X_n \mid \mathcal{H}] = 1$ a.s. and
$\liminf_{n \to \infty} E[X_n \mid \mathcal{H}] = 0$ a.s. In particular, Fatou's lemma does
not hold and the conditional expectations do not converge.
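
The oscillation of the conditional expectations can be illustrated with exact arithmetic. The sketch below rests on two assumptions about the construction, which is partly illegible in the source: that the n-th shrinking strip is (0, 1/n) in the second coordinate, and that the sweeping intervals run through the dyadic subintervals of (0, 1) level by level. The function names are our own. Under these assumptions, at a fixed point the conditional expectation, which is the indicator of the sweeping interval in the first coordinate, returns to 1 once per level, while the random variables themselves are eventually 0.

```python
from fractions import Fraction

def dyadic_interval(n):
    """Return (left, right) of the n-th interval (n >= 1) in the assumed sweep:
    (0,1/2), (1/2,1), (0,1/4), (1/4,1/2), (1/2,3/4), (3/4,1), (0,1/8), ...
    """
    level, first = 1, 1          # level m holds 2**m intervals of length 2**-m
    while n >= first + 2 ** level:
        first += 2 ** level
        level += 1
    j = n - first                # 0-based position inside the level
    return Fraction(j, 2 ** level), Fraction(j + 1, 2 ** level)

def cond_expectation(n, x):
    """Value at (x, y) of the conditional expectation: the indicator of A_n at x."""
    left, right = dyadic_interval(n)
    return 1 if left < x < right else 0

def x_n(n, x, y):
    """Value of X_n at (x, y): n on A_n x (0, 1/n), and 0 elsewhere."""
    left, right = dyadic_interval(n)
    return n if (left < x < right and 0 < y < Fraction(1, n)) else 0

x, y = Fraction(3, 10), Fraction(1, 7)
tail = range(7, 2047)            # indices covering dyadic levels 3 through 10
print(max(cond_expectation(n, x) for n in tail),
      min(cond_expectation(n, x) for n in tail),
      max(x_n(n, x, y) for n in tail))
```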

VI.  REGRESSION  FUNCTIONS 

Given  two  random  variables  X  and  Y  defined  on  the 
same  probability  space,  a  common  problem  concerns  the 
determination  of  the  form  of  the  regression  function 
$E[X \mid Y = y]$. For example, [13] considers this problem when
both $X$ and $Y$ are uniformly distributed. In the following example, we
show that the existence of a joint probability density function
for $X$ and $Y$ in no way guarantees that the regression
function will obey any regularity property, other than Borel
measurability.

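Let $g : \mathbb{R} \to \mathbb{R}$ be Borel measurable and define

$$ f(x, y) = \frac{1}{4} \exp\bigl( -\exp(|y|) \, |x - g(y)| \bigr). $$

Note that $f(x, y)$ is a joint probability density function since

$$
\begin{aligned}
\int_{\mathbb{R}} \int_{\mathbb{R}} f(x, y) \, dx \, dy
&= \int_{\mathbb{R}} \int_{\mathbb{R}} \frac{1}{4} \exp\bigl( -\exp(|y|) \, |x - g(y)| \bigr) \, dx \, dy \\
&= \int_{\mathbb{R}} \int_{\mathbb{R}} \frac{1}{4} \exp\bigl( -\exp(|y|) \, |z| \bigr) \, dz \, dy
= \int_{\mathbb{R}} \frac{1}{2} \exp(-|y|) \, dy = 1.
\end{aligned}
$$

Let $X$ and $Y$ be random variables such that the pair
$(X, Y)$ has a joint density function given by $f(x, y)$. Notice
from the above calculation that a second marginal density of
$f(x, y)$ is given by $f_Y(y) = \frac{1}{2} \exp(-|y|)$. Recalling that

$$ E[X \mid Y = y] = \int_{\mathbb{R}} x \, \frac{f(x, y)}{f_Y(y)} \, dx $$

and substituting for $f_Y(y)$ implies that

$$
\begin{aligned}
E[X \mid Y = y]
&= 2 \exp(|y|) \int_{\mathbb{R}} x \, \frac{1}{4} \exp\bigl( -\exp(|y|) \, |x - g(y)| \bigr) \, dx \\
&= 2 \exp(|y|) \int_{\mathbb{R}} (z + g(y)) \, \frac{1}{4} \exp\bigl( -\exp(|y|) \, |z| \bigr) \, dz \\
&= 2 \exp(|y|) \, g(y) \, \frac{1}{2 \exp(|y|)} = g(y).
\end{aligned}
$$

Hence, the random variables $X$ and $Y$ with the joint density
function $f(x, y)$ are such that $E[X \mid Y = y] = g(y)$ where we
recall that $g(\cdot)$ was an arbitrarily selected Borel measurable
function.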
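
The claim that the regression function reproduces an arbitrary Borel measurable target can be checked numerically for a specific choice. The sketch below assumes the joint density is one quarter of the exponential of minus exp(|y|) times the distance from x to the target value, which is the normalization that makes the density integrate to one; the discontinuous step used as the target and the integration parameters are hypothetical choices for illustration.

```python
import math

def g(y):
    """Hypothetical target regression function: a discontinuous step."""
    return 1.0 if y >= 0 else -3.0

def joint_density(x, y):
    """Assumed joint density: (1/4) exp(-exp(|y|) |x - g(y)|)."""
    return 0.25 * math.exp(-math.exp(abs(y)) * abs(x - g(y)))

def slice_integrals(y, half_width=50.0, steps=200_000):
    """Midpoint-rule estimates of (int f(x,y) dx, int x f(x,y) dx) over x."""
    h = 2.0 * half_width / steps
    mass = moment = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        d = joint_density(x, y)
        mass += d * h
        moment += x * d * h
    return mass, moment

def regression(y):
    """Numerical estimate of E[X | Y = y] as the ratio of the two slice integrals."""
    mass, moment = slice_integrals(y)
    return moment / mass

for y in (0.5, -2.0):
    mass, _ = slice_integrals(y)
    print(y, regression(y), g(y), mass, 0.5 * math.exp(-abs(y)))
```

The last two printed columns compare the numerically integrated slice with the marginal density of Y obtained in the text.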


VII.  MEAN  SQUARE  ESTIMATION 

One of the most common misconceptions in estimation
theory is that conditional expectation minimizes mean square
error. This mistaken concept arises in estimation and filtering
applications in engineering as well as in many $L_2$ minimization
problems in probability and statistics. As the following
example from [15] indicates, even for bounded random
variables, conditional expectation may not even come
close to minimizing the mean square error even though there
exists a function mapping the reals into the reals by which
the random variable of interest may be estimated precisely.

Consider the set $H$ and the probability space $(\Omega, \mathcal{F}, P)$
used in Section III. Let $\lambda$ denote Lebesgue measure on
$\mathcal{B}([0,1])$. Further, let $A$ be a fixed nonzero real number
and define two random variables $X$ and $Y$ on $(\Omega, \mathcal{F}, P)$ via
$X(\omega) = \omega$ and $Y(\omega) = A \, I_H(\omega)$. Notice that $\sigma(X) =
\mathcal{B}([0,1])$ and that $\sigma(Y) = \{\Omega, \emptyset, H, H^c\}$. Further, since
$P(H) = 1/2$ and $P(B) = \lambda(B)$ for $B \in \mathcal{B}([0,1])$, we see that
$P(H \cap B) = \lambda(B)/2 = P(H)P(B)$, or that $X$ is independent of
$Y$. Hence $E[Y \mid X] = E[Y] = A/2$ a.s. which implies that
$E[(Y - E[Y \mid X])^2] = E[(Y - A/2)^2] = A^2/4$. But, $Y(\omega) =
A \, I_H(X(\omega))$ for all $\omega \in \Omega$. Thus, $E[(Y - A \, I_H(X))^2] = 0$.
In other words, for this example there exists a function
$f : \mathbb{R} \to \mathbb{R}$ such that $Y(\omega) = f(X(\omega))$ for all $\omega \in \Omega$ yet, by
choice of $A$, $E[(Y - E[Y \mid X])^2]$ could be arbitrarily large.
We note further that in this case $\sigma(Y)$ is finite, $\sigma(X)$ contains
all singletons, and all moments of $X$ and $Y$ exist.

VIII.  DISTRIBUTED  ESTIMATION 


Consider a random variable $X$ and a set of random
variables $\{Y_1, \ldots, Y_n\}$ all defined on the same probability
space. A commonly considered problem in the area of
distributed estimation is that of how to best fuse or combine
estimates of the form $E[X \mid \mathcal{D}]$, where $\mathcal{D}$ is a nonempty
proper subset of $\{Y_1, \ldots, Y_n\}$, in order to obtain a single
good estimate of $E[X \mid Y_1, \ldots, Y_n]$. In the following
example, from [7], a situation is described, using common
distributions, in which any such method of fusion is useless.

For a positive integer $n$ greater than one, consider a set
of random variables $\{X, Y_1, \ldots, Y_n\}$ with a joint
probability density function given, as in [11], by

$$
f(x, y_1, \ldots, y_n) = \frac{1}{(2\pi)^{(n+1)/2}}
\exp\left( -\frac{1}{2} \Bigl( x^2 + \sum_{i=1}^{n} y_i^2 \Bigr) \right)
\cdot \left[ 1 + x \exp\left( -\frac{x^2}{2} \right) \prod_{i=1}^{n} y_i \exp\left( -\frac{y_i^2}{2} \right) \right].
$$

It follows straightforwardly
that the set $\{X, Y_1, \ldots, Y_n\}$ is not mutually
Gaussian and not mutually independent, yet any proper
subset of $\{X, Y_1, \ldots, Y_n\}$ containing at least two random
variables is mutually independent, mutually Gaussian, and
identically distributed with each random variable having zero
mean and unit variance. For any nonempty proper subset $\mathcal{D}$
of $\{Y_1, \ldots, Y_n\}$, we note that $E[X \mid \mathcal{D}] = 0$ a.s. since $X$ is
independent of $\mathcal{D}$. However, that

$$ E[X \mid Y_1, \ldots, Y_n] = \frac{1}{2\sqrt{2}} \, Y_1 \cdots Y_n \exp\left( -\frac{1}{2} \bigl( Y_1^2 + Y_2^2 + \cdots + Y_n^2 \bigr) \right) \quad \text{a.s.} $$

follows easily. Thus, since any Borel measurable function of the
estimates $E[X \mid \mathcal{D}]$ where $\mathcal{D}$ ranges over all nonempty
proper subsets of $\{Y_1, \ldots, Y_n\}$ would be constant almost
surely, it would be absurd to attempt to estimate
$E[X \mid Y_1, \ldots, Y_n]$ based on a combination of these estimates.
Once again, notice that the oft used and much abused
Gaussian assumption does not alleviate this difficulty.
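
The closed form for the full conditional expectation can be compared against direct numerical integration. The sketch below assumes one reading of the joint density: a product of standard Gaussian densities multiplied by one plus the product of the terms t exp(-t^2/2) over all n + 1 coordinates. Under that reading the conditional mean given all of the observations equals their product times exp of minus half their sum of squares, divided by 2 sqrt(2). The function names and test point are our own.

```python
import math

def phi(t):
    """Standard Gaussian density."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def joint_density(x, ys):
    """Assumed joint density of (X, Y_1, ..., Y_n) at (x, ys)."""
    base = phi(x) * math.prod(phi(y) for y in ys)
    bump = x * math.exp(-x * x / 2.0) * math.prod(y * math.exp(-y * y / 2.0) for y in ys)
    return base * (1.0 + bump)

def cond_mean_numeric(ys, half_width=12.0, steps=120_000):
    """Midpoint-rule estimate of E[X | Y = ys] as a ratio of integrals over x."""
    h = 2.0 * half_width / steps
    num = den = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        d = joint_density(x, ys)
        num += x * d
        den += d
    return num / den

def cond_mean_formula(ys):
    """Closed form: prod(ys) * exp(-sum(y^2)/2) / (2 sqrt(2))."""
    return math.prod(ys) * math.exp(-0.5 * sum(y * y for y in ys)) / (2.0 * math.sqrt(2.0))

ys = (0.7, -1.2)
print(cond_mean_numeric(ys), cond_mean_formula(ys))
```

The full conditional mean clearly varies with the observations, whereas each estimate based on a proper subset is identically zero, which is why no fusion rule applied to the latter can recover the former.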

IX.  MARTINGALES 

The  subject  of  martingale  theory  is  an  important  aspect  of 
conditioning  which  finds  many  applications  in  information 
sciences  and  systems.  The  following  example  shows  that  a 
martingale  may  have  a  constant  positive  mean,  converge  a.s. 
to  zero  in  finite  time,  and  yet  with  positive  probability  exceed 
any  real  number. 

Let $\{X_n : n \in \mathbb{N}\}$ be a sequence of mutually independent
identically distributed random variables such that $P(X_1 = 0)
= P(X_1 = 2) = 1/2$. Now, for each positive integer $n$,
define $Y_n = X_1 X_2 \cdots X_n$, and note that $\{Y_n : n \in \mathbb{N}\}$ is a
martingale and that $E[Y_n] = 1$ for all $n \in \mathbb{N}$. Further, notice
that not only does the sequence $\{Y_n : n \in \mathbb{N}\}$ converge
almost surely to zero, but with probability one, only a finite
number of terms of the sequence $\{Y_n : n \in \mathbb{N}\}$ are nonzero.
Even so, it follows easily that $Y_n$ exceeds any real value
with positive probability since $P(Y_n = 2^n) > 0$ for all $n \in \mathbb{N}$.
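
For small n the exact law of the product can be enumerated, which makes the three properties concrete. The following is a brief sketch with function names of our own choosing; it computes the distribution of the product of n independent factors, each equal to 0 or 2 with probability one half.

```python
from fractions import Fraction
from itertools import product

def law_of_y(n):
    """Exact distribution {value: probability} of Y_n = X_1 * ... * X_n."""
    law = {}
    for xs in product((0, 2), repeat=n):
        value = 1
        for x in xs:
            value *= x
        law[value] = law.get(value, Fraction(0)) + Fraction(1, 2 ** n)
    return law

def mean(law):
    """Expectation of a finitely supported distribution."""
    return sum(v * p for v, p in law.items())

for n in (1, 2, 5, 10):
    law = law_of_y(n)
    # Constant mean, vanishing chance of being nonzero, growing maximum value.
    print(n, mean(law), law[0], max(law))
```

The mean stays at one only because the single surviving path carries the value 2^n with probability 2^(-n); almost every path is absorbed at zero after finitely many steps.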

Consider now the following example from [14] which
illustrates a pathology concerning the martingale convergence
theorem. In particular, it shows that in certain
circumstances the martingale convergence theorem might be
useless as an estimation technique.

Consider the probability space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), P)$ where $P$
denotes zero mean, unit variance Gaussian measure on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Let $P_*$ denote the inner $P$ measure on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Let $S$ be a subset of $\mathbb{R}$ such that $P_*(S) = P_*(S^c)
= 0$. (That such sets exist is shown in [14].) Further, let $\mathcal{W}
= \{(S \cap B_1) \cup (S^c \cap B_2) : B_1, B_2 \in \mathcal{B}(\mathbb{R})\}$ and note that
$\mathcal{W}$ is a $\sigma$-algebra on $\mathbb{R}$ which includes $\mathcal{B}(\mathbb{R})$. Define a
probability measure $\mu$ on $(\mathbb{R}, \mathcal{W})$ via
$\mu((S \cap B_1) \cup (S^c \cap B_2)) = (P(B_1) + P(B_2))/2$. (That $\mu$ is
well-defined follows from the properties of $S$.) Note that the
restriction of $\mu$ to $\mathcal{B}(\mathbb{R})$ is $P$.

Consider now the probability space $(\mathbb{R}, \mathcal{W}, \mu)$. Note
first that $S$ and $S^c$ are each independent of $\mathcal{B}(\mathbb{R})$ since, for
any Borel set $B$, $\mu(S \cap B) = \mu(B)/2 = \mu(S)\,\mu(B)$ and
$\mu(S^c \cap B) = \mu(B)/2 = \mu(S^c)\,\mu(B)$. Now define a random
variable $X$ on $(\mathbb{R}, \mathcal{W}, \mu)$ via $X(x) = x\,I_S(x) - x\,I_{S^c}(x)$ and
notice that, for any Borel set $B$, $\mu(X \in B) =
\mu((S \cap B) \cup (S^c \cap \{x \in \mathbb{R} : -x \in B\})) = P(B)$ since $P$ is
symmetric. Hence, $X$ is a Gaussian random variable with
zero mean and unit variance. Further, note that
$E[X(x) \mid \mathcal{B}(\mathbb{R})] = x\,E[I_S(x) - I_{S^c}(x) \mid \mathcal{B}(\mathbb{R})] =
x\,E[I_S(x) - I_{S^c}(x)] = 0$ a.s. since the identity map is Borel
measurable, $S$ and $S^c$ are independent of $\mathcal{B}(\mathbb{R})$, and $\mu(S) =
\mu(S^c) = 1/2$.
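
The probabilities used in this paragraph follow from the definition of $\mu$ by particular choices of the two Borel sets. A short worked version is given below; the shorthand $-B = \{x \in \mathbb{R} : -x \in B\}$ is introduced here for brevity and is not in the original.

```latex
\begin{align*}
  \mu(S) &= \mu\bigl((S \cap \mathbb{R}) \cup (S^c \cap \emptyset)\bigr)
          = \frac{P(\mathbb{R}) + P(\emptyset)}{2} = \frac{1}{2}, \\
  \mu(S \cap B) &= \mu\bigl((S \cap B) \cup (S^c \cap \emptyset)\bigr)
          = \frac{P(B) + P(\emptyset)}{2} = \frac{P(B)}{2} = \mu(S)\,\mu(B), \\
  \mu(X \in B) &= \mu\bigl((S \cap B) \cup (S^c \cap (-B))\bigr)
          = \frac{P(B) + P(-B)}{2} = P(B).
\end{align*}
% The second line uses \mu(B) = P(B) for Borel B; the last equality uses the
% symmetry of the Gaussian measure, P(-B) = P(B). The same computation with
% S and S^c interchanged gives \mu(S^c) = 1/2 and \mu(S^c \cap B) = \mu(S^c)\,\mu(B).
```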

Now, let $\{Y_k : k \in \mathbb{N}\}$ be any sequence of Borel
measurable functions mapping $\mathbb{R}$ into $\mathbb{R}$. Note that
$\{Y_k : k \in \mathbb{N}\}$ is a sequence of random variables on
$(\mathbb{R}, \mathcal{W}, \mu)$. Consider the martingale
$\{X_k = E[X \mid Y_1, \ldots, Y_k] : k \in \mathbb{N}\}$. Since, given any
$k \in \mathbb{N}$, $\mathcal{B}(\mathbb{R})$ includes $\sigma(Y_1, \ldots, Y_k)$, it follows that
$X_k = E[X \mid Y_1, \ldots, Y_k] = E[\,E[X \mid \mathcal{B}(\mathbb{R})] \mid Y_1, \ldots, Y_k] = 0$
a.s. using the previous result. Hence, the martingale
convergence theorem is completely useless in estimating the
random variable $X$ in terms of the random variables
$\{Y_k : k \in \mathbb{N}\}$. Furthermore, note that for any sequence
$\{s_k : k \in \mathbb{N}\}$ of positive real numbers, we could let $Y_k(x) =
s_k x$. In this case, the above phenomenon is exhibited when
all of the random variables of concern are Gaussian. Finally,
we note that, yet again, the ubiquitous Gaussian assumption
does not protect us from this disturbing problem.
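
The nonmeasurable set S itself cannot be simulated, but the joint law of X and the identity map can be imitated numerically. The sketch below rests on an assumption that is not part of the paper's construction: an independent fair random sign stands in for the difference of the indicators of S and its complement, which under the extended measure takes the values +1 and -1 with probability 1/2 each, independently of the Borel sets. The function name `simulate`, its parameters, and the choice of scales are hypothetical.

```python
import random


def simulate(n_samples=100_000, scales=(0.5, 1.0, 2.0), seed=0):
    """Monte Carlo sketch of X = sign * Z with Y_k = s_k * Z.

    Assumption (not from the paper): an independent fair sign replaces
    the indicator difference I_S - I_{S^c}.
    """
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Flip the sign of each draw with probability 1/2.
    xs = [z if rng.random() < 0.5 else -z for z in zs]

    mean_x = sum(xs) / n_samples
    var_x = sum((x - mean_x) ** 2 for x in xs) / n_samples

    # Sample covariance between X and each Y_k = s_k * Z.
    mean_z = sum(zs) / n_samples
    mean_xz = sum(x * z for x, z in zip(xs, zs)) / n_samples
    cov_x_y = [s * (mean_xz - mean_x * mean_z) for s in scales]

    # Crude estimate of E[X | Z in bin] over a few unit-width bins.
    edges = [-2.0, -1.0, 0.0, 1.0, 2.0]
    binned = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        vals = [x for x, z in zip(xs, zs) if lo <= z < hi]
        binned.append(sum(vals) / len(vals))

    return {
        "mean_x": mean_x,
        "var_x": var_x,
        "cov_x_y": cov_x_y,
        "binned_mean_x_given_z": binned,
    }
```

Calling `simulate()` should return a sample variance of X near one, with the covariances and the binned conditional means all near zero. This is consistent with X being standard Gaussian while every estimate of X from the Y_k collapses to zero, although a simulation of this kind only illustrates the result and does not prove it.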

Another disturbing result concerning the martingale
convergence theorem is detailed in [10]. There it is shown
that the convergence rate guaranteed by the martingale
convergence theorem can be arbitrarily slow. This result
contrasts with the previous example in which the convergence
was instantaneous, yet to the wrong random variable.

X.  CONCLUSION 

We  hope  these  comments  will  be  helpful  to  those  using 
conditioning  as  a  tool  in  investigations.  Although  some  of 
these  examples  are  undoubtedly  well  known  to  the  specialist 
in  measure  theory,  as  previously  mentioned,  our  experience 
indicates that these caveats have been overlooked by many
working in the area of information sciences and systems. In
conclusion,  if  this  paper  serves  no  other  purpose,  we  hope  it 
will  serve  as  a  reminder  that  conditioning  can  be  a  dangerous 
tool  in  the  hands  of  amateurs. 